Developer Source 8


/ Developer Source 8 / developer source - volume 8.iso / dobbs / mar97 / singf106.gif

Graphics Interchange Format | 1997-06-26 | 27KB | 244x290 | 4-bit (16 colors)

Labels: text | letter | font | document
OCR: [Figure: states $s_t$, $s_{t+1}$, $s_{t+2}$ along a trajectory] Figure 6: The program's experience consists of a trajectory through state space. At time step $t$, the state is $s_t$, and the agent faces a choice of actions. The action the agent chooses to execute at step $t$ is $a_t$. The reward at step $t$, $r_t$, is a function of $s_t$ and $a_t$. The next state, $s_{t+1}$, depends on $s_t$, $a_t$, and many random events, such as passengers arriving at floors and pushing buttons. Reinforcement learning allows a program to use such a trajectory to incrementally improve its policy.
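To make the caption concrete, here is a minimal sketch of how a program can learn from a trajectory of transitions $(s_t, a_t, r_t, s_{t+1})$. It uses tabular Q-learning, one standard reinforcement learning method. The caption does not say which method the article used, so treat it as an illustration only. The environment is a hypothetical toy stand-in for the elevator task, not the article's simulator. It models one car on four floors, a reward for reaching the top floor, and a random "new call" that relocates the car. All names (`step`, `q_update`, `greedy_policy`, `learn_from_trajectory`, `FLOORS`, `ACTIONS`) and parameter values (`alpha`, `gamma`, `steps`) are assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical toy stand-in for the elevator task (not from the article):
# one car on floors 0..3; actions move it down (-1), hold (0) or up (+1).
FLOORS = 4
ACTIONS = (-1, 0, 1)


def step(s, a, rng):
    """Return (reward, next_state) for taking action a in state s.

    Reward is 1.0 whenever the move leaves the car on the top floor; a
    random event (a new call) then places the car on a uniformly random
    floor, so the next state depends on s, a and chance.
    """
    s_next = min(max(s + a, 0), FLOORS - 1)
    if s_next == FLOORS - 1:
        return 1.0, rng.randrange(FLOORS)
    return 0.0, s_next


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update from the transition (s_t, a_t, r_t, s_{t+1})."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def greedy_policy(Q, s):
    """Policy implied by the current Q table: the highest-valued action in s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])


def learn_from_trajectory(steps=20000, seed=0):
    """Follow a uniformly random behavior policy and learn Q from its trajectory."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    s = rng.randrange(FLOORS)
    for _ in range(steps):
        a = rng.choice(ACTIONS)        # action a_t (exploratory)
        r, s_next = step(s, a, rng)    # reward r_t and next state s_{t+1}
        q_update(Q, s, a, r, s_next)   # incremental improvement from one transition
        s = s_next
    return Q
```

Q-learning is off-policy, so even this purely random trajectory should be enough for the learned greedy policy to move the car up from floors 0, 1 and 2. A real dispatcher would face a far larger state space and would need function approximation in place of a lookup table.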